Trust Region Policy Optimization

نویسندگان

John Schulman

Sergey Levine

Pieter Abbeel

Michael I. Jordan

Philipp Moritz

چکیده

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

Trust region methods, such as TRPO, are often used to stabilize policy optimization algorithms in reinforcement learning (RL). While current trust region strategies are effective for continuous control, they typically require a prohibitively large amount of on-policy interaction with the environment. To address this problem, we propose an offpolicy trust region method, Trust-PCL. The algorithm ...

متن کامل

Trust-pcl: an Off-policy Trust Region Method for Continuous Control

Trust region methods, such as TRPO, are often used to stabilize policy optimization algorithms in reinforcement learning (RL). While current trust region strategies are effective for continuous control, they typically require a large amount of on-policy interaction with the environment. To address this problem, we propose an off-policy trust region method, Trust-PCL, which exploits an observati...

متن کامل

Trust-pcl: an Off-policy Trust Region Method for Continuous Control

Trust region methods, such as TRPO, are often used to stabilize policy optimization algorithms in reinforcement learning (RL). While current trust region strategies are effective for continuous control, they typically require a large amount of on-policy interaction with the environment. To address this problem, we propose an off-policy trust region method, Trust-PCL, which exploits an observati...

متن کامل

Stochastic Variance Reduction for Policy Gradient Estimation

Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) technique [1] to model-free p...

متن کامل

On- and Off-Policy Monotonic Policy Improvement

Monotonic policy improvement and off-policy learning are two main desirable properties for reinforcement learning algorithms. In this study, we show that the monotonic policy improvement is guaranteed from onand off-policy mixture data. Based on the theoretical result, we provide an algorithm which uses the experience replay technique for trust region policy optimization. The proposed method ca...

متن کامل

A Trust-region Method using Extended Nonmonotone Technique for Unconstrained Optimization

In this paper, we present a nonmonotone trust-region algorithm for unconstrained optimization. We first introduce a variant of the nonmonotone strategy proposed by Ahookhosh and Amini cite{AhA 01} and incorporate it into the trust-region framework to construct a more efficient approach. Our new nonmonotone strategy combines the current function value with the maximum function values in some pri...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Trust Region Policy Optimization

نویسندگان

چکیده

منابع مشابه

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

Trust-pcl: an Off-policy Trust Region Method for Continuous Control

Trust-pcl: an Off-policy Trust Region Method for Continuous Control

Stochastic Variance Reduction for Policy Gradient Estimation

On- and Off-Policy Monotonic Policy Improvement

A Trust-region Method using Extended Nonmonotone Technique for Unconstrained Optimization

عنوان ژورنال:

اشتراک گذاری